Social Determinants in Diabetes in Pennsylvania

BMIN503/EPID600 Final Project

Author

Karen Tang


1 Overview

The project is to analyze the correlation between diabetes and several social determinants, including race, income, insurance, and clinical distance. The goal is to understand how these factors influence the prevalence and management of diabetes, providing insights that can inform public strategies, healthcare policies, and intervention programs.

I spoke to Dr. Richard Tsui abotu my project, he guided me to choose a specific social determinants that directly correlates to the disease I want to learn more about.

2 Introduction

According to the CDC, in 2020, 38.4 million people in the United States of all ages had diabetes. Diabetes was the eighth leading cause of death in the United States. In an article called “Overview of Social Determinants of Health in the Development of Diabetes” from the Diabetes Journals stated that diabetes has a long-standing, well-documented socioeconomic and racial/ethnic inequalities in disease prevalence and incidence, morbidity and mortality. Higher diabetes prevalence is associated with lower education, lower income, and non-White race/ethnicity.

World Health Organization (WHO) Commission defined Social Determinants of Health (SDOH) as “the conditions in which people are born, grow, live, work and age, and the wider set of forces and systems shaping the conditions of daily life”. SDOH attributed between 30%-55% of health outcome and they viewed as the main driver of avoidable health inequities. Due to the association between social determinants of health and diabetes, I would like to learn and conduct an analysis on the following factors: race, income, and insurance, distance to clinics.

3 Methods

The dataset will be used in this analysis are: #1. Diabetes = BRFSS dataset in 2022 #2. Social determinants = the Agency for Healthcare Research and Quality’s Social Determinants of Health Database, the data I used is the Census Tract of 2020.

#Loading the necessary packages
library(readxl)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.5
✔ ggplot2   3.5.1     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(sf)
Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
library(tigris)
To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.
library(leaflet)
library(maps)

Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map
library(readr)
library(knitr)
library(RColorBrewer)
#Loading diabetes dataset
diabetes <- read_csv("C:/Users/nghik/Downloads/ExportCSV.csv")
Rows: 235 Columns: 28
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (21): LocationAbbr, LocationDesc, Class, Topic, Indicator, Response, Dat...
dbl  (7): ID, Year, Low_Confidence_Limit, High_Confidence_Limit, Sample_Size...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Only choosing dataset for PA and WV
PA_WV_diabetes <- diabetes %>%
  filter(LocationDesc %in% c("Pennsylvania", "West Virginia"))

#Having the variable Data_Value as numeric in order to graph/compare later
diabetes$Data_Value <- as.numeric(diabetes$Data_Value)
Warning: NAs introduced by coercion
#Looking for the state with the highest diabetes diagnosis, excluding U.S. islands
most_diabetes <- diabetes %>%
  filter(Response == "Yes", 
         !grepl("median", LocationDesc, ignore.case = TRUE), 
         !LocationDesc %in% c("Guam", "Puerto Rico", "Virgin Islands")) %>%
  select(LocationDesc, Response, Data_Value) %>%
  arrange(desc(Data_Value)) %>%
  head(1)

# Print the result
print(most_diabetes)
# A tibble: 1 × 3
  LocationDesc  Response Data_Value
  <chr>         <chr>         <dbl>
1 West Virginia Yes            17.4

From the output above, the state with the highest population diagnosed with diabetes is West Virginia.

Showing the percentage of population with diabetes from Pennsylvania and West Virginia

print(PA_WV_diabetes %>%
        filter(Response == "Yes") %>%
        select(LocationDesc, Response, Data_Value))
# A tibble: 2 × 3
  LocationDesc  Response Data_Value
  <chr>         <chr>    <chr>     
1 Pennsylvania  Yes      11.5      
2 West Virginia Yes      17.4      

Providing a visualization on the percentage of population with diabetes based on all states in the United States

#Diabetes = yes data 
diabetes_yes <- diabetes %>%
  filter(Response == "Yes", 
         !grepl("median", LocationDesc, ignore.case = TRUE), 
         !LocationDesc %in% c("Guam", "Puerto Rico", "Virgin Islands"))

#Download counties data on every states in U.S.
counties1 <- counties(cb = TRUE, class = "sf")
Retrieving data for the year 2022

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
#Joining diabetes and counties to map
diabetes_map <- inner_join(diabetes_yes, counties1, by = c("LocationAbbr" = "STUSPS"))

#Making the diabetes as sf data to map
diabetes_map <- st_as_sf(diabetes_map)

#Create color palette based on the percentage of diabetes 
pal <- colorNumeric(
  palette = "YlOrRd",
  domain = diabetes_map$Data_Value
)
# Create a popup showing diabetes data
map_data <- diabetes_map %>%
  mutate(popup_info = paste0(
    "<b>State:</b> ", LocationAbbr, "<br>",
    "<b>White:</b> ", Data_Value, "<br>"
  ))

# Generate the map
leaflet(data = map_data) %>%
  addTiles() %>%
  addPolygons(
    fillColor = ~pal(Data_Value), 
    fillOpacity = 0.5,
    color = "black",
    weight = 1,
    popup = ~popup_info
  ) %>%
  fitBounds(
    lng1 = min(st_bbox(diabetes_map)$xmin), # Minimum longitude
    lat1 = min(st_bbox(diabetes_map)$ymin), # Minimum latitude
    lng2 = max(st_bbox(diabetes_map)$xmax), # Maximum longitude
    lat2 = max(st_bbox(diabetes_map)$ymax)  # Maximum latitude
  )
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'

#Let’s make a histogram of diabetes in both West Virginia and Pennsylvania to see the difference of diabetes prevalence between the two states.

#Making histogram
diabetes_state <- ggplot (data = PA_WV_diabetes, aes(x = Response, y= Data_Value, fill = LocationDesc)) +
  geom_bar (stat = "identity", position = "dodge") +
  scale_fill_discrete(name = "States") +
  labs( title = "Diabetes Numbers Based on States",
        x = "Diagnosis", y = "Percentage (%)")
  
print(diabetes_state)

Based on the histogram output, the percentage of No-diabetes is significantly higher than diabetes diagnosis. This means that people are being tested for diabetes and most tests are coming back negative. This can be due to people visiting their doctor regularly to have routine checkups.

Next, I will be preparing the datasets are analysis - this following steps were done beforehand to minimize file size because I had a hard time rendering the file

#Loading in the social determinants file social_determinants<- read_excel(“C:/Users/nghik/Downloads/sdoh_2020_tract_1_0.xlsx”, sheet = 2)

#Selecting data from just Pennsylvania and West Virginia PA_WV_social <- social_determinants %>% filter(STATE == c(“Pennsylvania”, “West Virginia”))

#writing the file and send it to my computer write.csv(PA_WV_social, “C:/Users/nghik/Downloads/PA_WV_social”)

The next steps are done to complete the analysis after the data clean up above

#Pulling PA_WV_social dataset in from my computer
PA_WV_social <- read_csv("C:/Users/nghik/Downloads/PA_WV_social")
New names:
• `` -> `...1`
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 1996 Columns: 330
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (3): STATE, COUNTY, REGION
dbl (325): ...1, YEAR, TRACTFIPS, COUNTYFIPS, STATEFIPS, TERRITORY, ACS_TOT_...
lgl   (2): ACS_MEDIAN_HH_INC_AIAN, ACS_MEDIAN_HH_INC_NHPI

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Selecting variables that will be use 
PA_WV_social_final <- PA_WV_social %>%
  select(c("TRACTFIPS", "STATE", "COUNTYFIPS", "STATEFIPS", #States identifiers
           "ACS_PCT_WHITE", "ACS_PCT_AIAN", "ACS_PCT_ASIAN", "ACS_PCT_BLACK", "ACS_PCT_HISPANIC", "ACS_PCT_NHPI", "ACS_PCT_WHITE", #race/ethnicity
           "ACS_PCT_MEDICAID_ANY","ACS_PCT_MEDICARE_ONLY","ACS_PCT_PRIVATE_ANY",  "ACS_PCT_PUBLIC_ONLY","ACS_PCT_UNINSURED",#Insurance
           "ACS_PCT_HH_INC_10000", "ACS_PCT_HH_INC_14999", "ACS_PCT_HH_INC_24999", "ACS_PCT_HH_INC_49999", "ACS_PCT_HH_INC_99999", "ACS_PCT_HH_INC_100000", #Income
           "POS_DIST_CLINIC_TRACT")) #Distance to the nearest clinic

#Downloading data with sf data in order to map
counties <- counties(state = c("WV", "PA"), cb = TRUE, class = "sf") #FIPS CODE: 54 = West Virginia, and 42 = Pennsylvania
Retrieving data for the year 2022
print(counties)
Simple feature collection with 122 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -82.64474 ymin: 37.20148 xmax: -74.68952 ymax: 42.26986
Geodetic CRS:  NAD83
First 10 features:
    STATEFP COUNTYFP COUNTYNS       AFFGEOID GEOID      NAME         NAMELSAD
106      42      045 01209177 0500000US42045 42045  Delaware  Delaware County
107      42      049 01209178 0500000US42049 42049      Erie      Erie County
108      42      059 01214033 0500000US42059 42059    Greene    Greene County
125      54      107 01560558 0500000US54107 54107      Wood      Wood County
141      54      031 01718557 0500000US54031 54031     Hardy     Hardy County
142      54      023 01697238 0500000US54023 54023     Grant     Grant County
230      42      071 01209181 0500000US42071 42071 Lancaster Lancaster County
231      42      043 01213667 0500000US42043 42043   Dauphin   Dauphin County
232      42      067 01209180 0500000US42067 42067   Juniata   Juniata County
233      42      027 01214720 0500000US42027 42027    Centre    Centre County
    STUSPS    STATE_NAME LSAD      ALAND     AWATER
106     PA  Pennsylvania   06  476068778   17591835
107     PA  Pennsylvania   06 2069143908 1966272916
108     PA  Pennsylvania   06 1491658043    5296803
125     WV West Virginia   06  949229778   26590767
141     WV West Virginia   06 1508190150    5624064
142     WV West Virginia   06 1236387824    7546060
230     PA  Pennsylvania   06 2444751632  103368819
231     PA  Pennsylvania   06 1359407361   86047350
232     PA  Pennsylvania   06 1013697486    5502986
233     PA  Pennsylvania   06 2867787787   11295681
                          geometry
106 MULTIPOLYGON (((-75.60153 3...
107 MULTIPOLYGON (((-80.51941 4...
108 MULTIPOLYGON (((-80.51942 3...
125 MULTIPOLYGON (((-81.75582 3...
141 MULTIPOLYGON (((-79.13226 3...
142 MULTIPOLYGON (((-79.48687 3...
230 MULTIPOLYGON (((-76.72162 4...
231 MULTIPOLYGON (((-77.02934 4...
232 MULTIPOLYGON (((-77.74677 4...
233 MULTIPOLYGON (((-78.37644 4...

With all the data ready, let’s dive into specific analysis. I will be making a race dataframe to have data on just data on race and conduct a histogram to see the break down of race from both states.

#Making a race dataframe
race <- PA_WV_social_final %>%
  select("TRACTFIPS", "STATE", "COUNTYFIPS", "STATEFIPS", #States identifiers
         "ACS_PCT_WHITE", #Percentage of population reported as White
         "ACS_PCT_AIAN", #Percentage of population reported as American Indian and Alaska Native
         "ACS_PCT_ASIAN", #Percentage of population reported as Asian
         "ACS_PCT_BLACK", #Percentage of population reported as Black
         "ACS_PCT_HISPANIC", #Percentage of population reported as Hispanic
         "ACS_PCT_NHPI" #Percentage of population reported as Native Hawaiian and Pacific Islander
         )

#Average from each race from both states
race_summary <- race %>%
  group_by(STATE) %>%
  summarise(Average_White = mean(ACS_PCT_WHITE, na.rm = TRUE),
            Average_AIAN = mean(ACS_PCT_AIAN, na.rm = TRUE),
            Average_Asian = mean(ACS_PCT_ASIAN, na.rm = TRUE),
            Average_Black = mean (ACS_PCT_BLACK, na.rm = TRUE),
            Average_Hispanic = mean (ACS_PCT_HISPANIC, na.rm = TRUE),
            Average_NHPI = mean (ACS_PCT_NHPI, na.rm = TRUE))
(race_summary)
# A tibble: 2 × 7
  STATE  Average_White Average_AIAN Average_Asian Average_Black Average_Hispanic
  <chr>          <dbl>        <dbl>         <dbl>         <dbl>            <dbl>
1 Penns…          79.1        0.167         3.21          11.7              7.38
2 West …          92.3        0.162         0.763          3.88             1.54
# ℹ 1 more variable: Average_NHPI <dbl>

Making the histogram of race

state_long <- race_summary %>%
  pivot_longer(
    cols = starts_with("Average_"),    # Columns to pivot
    names_to = "Race",             # New column for race
    values_to = "Percentage"       # New column for values
  ) %>%
  mutate(Race = gsub("Mean_", "", Race))  # Clean up race names

race_graph <- ggplot(state_long, aes(x = Race, y = Percentage, fill = STATE)) +
  geom_bar(stat = "identity", position = "dodge") +  # Grouped bars
  theme_minimal() +
  labs(title = "Racial Demographics Comparison by State",
    x = "Race Category",
    y = "Percentage (%)",
    fill = "State"
  ) +
  scale_fill_manual(values = c("Pennsylvania" = "blue", "West Virginia" = "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels

print(race_graph)

Based on the graph above, it looks like Pennsylvania is far more ethnical and racial diverse comparing to West Virginia, which could influence diabetes and healthcare access. The diversity in PA may lead to varies health risks and disparties, while WV, with more homogeneous population may face different but significant healthcare challenges. Next step, I will look into income as lower income often linked to poorer health outcomes.

Merging counties data and all interest variables to make maps

#Removing the two first number in COUNTY FIPS
counties$GEOID <- as.numeric(counties$GEOID)
#Joining the two dataset
map_need <- inner_join(counties, PA_WV_social_final, by = c("GEOID" = "COUNTYFIPS"))
summary(map_need)
   STATEFP            COUNTYFP           COUNTYNS           AFFGEOID        
 Length:1996        Length:1996        Length:1996        Length:1996       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
     GEOID           NAME             NAMELSAD            STUSPS         
 Min.   :42001   Length:1996        Length:1996        Length:1996       
 1st Qu.:42029   Class :character   Class :character   Class :character  
 Median :42079   Mode  :character   Mode  :character   Mode  :character  
 Mean   :43702                                                           
 3rd Qu.:42105                                                           
 Max.   :54109                                                           
                                                                         
  STATE_NAME            LSAD               ALAND               AWATER         
 Length:1996        Length:1996        Min.   :2.140e+08   Min.   :3.027e+05  
 Class :character   Class :character   1st Qu.:9.577e+08   1st Qu.:1.142e+07  
 Mode  :character   Mode  :character   Median :1.565e+09   Median :2.161e+07  
                                       Mean   :1.538e+09   Mean   :6.080e+07  
                                       3rd Qu.:2.069e+09   3rd Qu.:3.705e+07  
                                       Max.   :3.183e+09   Max.   :1.966e+09  
                                                                              
   TRACTFIPS            STATE             STATEFIPS     ACS_PCT_WHITE   
 Min.   :4.200e+10   Length:1996        Min.   :42.00   Min.   :  0.00  
 1st Qu.:4.203e+10   Class :character   1st Qu.:42.00   1st Qu.: 76.35  
 Median :4.208e+10   Mode  :character   Median :42.00   Median : 90.25  
 Mean   :4.370e+10                      Mean   :43.64   Mean   : 80.88  
 3rd Qu.:4.211e+10                      3rd Qu.:42.00   3rd Qu.: 96.17  
 Max.   :5.411e+10                      Max.   :54.00   Max.   :100.00  
                                                        NA's   :20      
  ACS_PCT_AIAN    ACS_PCT_ASIAN    ACS_PCT_BLACK    ACS_PCT_HISPANIC
 Min.   :0.0000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.: 0.510   1st Qu.: 0.900  
 Median :0.0000   Median : 0.770   Median : 2.515   Median : 2.610  
 Mean   :0.1664   Mean   : 2.872   Mean   :10.606   Mean   : 6.580  
 3rd Qu.:0.0425   3rd Qu.: 3.310   3rd Qu.:10.053   3rd Qu.: 6.133  
 Max.   :7.9300   Max.   :42.640   Max.   :98.190   Max.   :88.870  
 NA's   :20       NA's   :20       NA's   :20       NA's   :20      
  ACS_PCT_NHPI     ACS_PCT_MEDICAID_ANY ACS_PCT_MEDICARE_ONLY
 Min.   :0.00000   Min.   : 0.000       Min.   : 0.000       
 1st Qu.:0.00000   1st Qu.: 7.798       1st Qu.: 3.530       
 Median :0.00000   Median :14.250       Median : 5.050       
 Mean   :0.02762   Mean   :17.662       Mean   : 5.474       
 3rd Qu.:0.00000   3rd Qu.:23.927       3rd Qu.: 6.862       
 Max.   :2.30000   Max.   :81.130       Max.   :38.000       
 NA's   :20        NA's   :24           NA's   :24           
 ACS_PCT_PRIVATE_ANY ACS_PCT_PUBLIC_ONLY ACS_PCT_UNINSURED ACS_PCT_HH_INC_10000
 Min.   :  0.00      Min.   : 0.00       Min.   : 0.000    Min.   : 0.00       
 1st Qu.: 53.45      1st Qu.:14.19       1st Qu.: 2.598    1st Qu.: 2.36       
 Median : 65.54      Median :21.18       Median : 4.600    Median : 4.63       
 Mean   : 63.24      Mean   :24.27       Mean   : 5.663    Mean   : 6.55       
 3rd Qu.: 75.12      3rd Qu.:31.20       3rd Qu.: 7.385    3rd Qu.: 8.61       
 Max.   :100.00      Max.   :81.13       Max.   :46.990    Max.   :52.72       
 NA's   :24          NA's   :24          NA's   :24        NA's   :27          
 ACS_PCT_HH_INC_14999 ACS_PCT_HH_INC_24999 ACS_PCT_HH_INC_49999
 Min.   :  0.000      Min.   : 0.000       Min.   : 0.00       
 1st Qu.:  1.640      1st Qu.: 5.350       1st Qu.:16.12       
 Median :  3.520      Median : 8.760       Median :21.86       
 Mean   :  4.582      Mean   : 9.576       Mean   :21.91       
 3rd Qu.:  6.160      3rd Qu.:12.750       3rd Qu.:27.17       
 Max.   :100.000      Max.   :48.570       Max.   :52.27       
 NA's   :27           NA's   :27           NA's   :27          
 ACS_PCT_HH_INC_99999 ACS_PCT_HH_INC_100000 POS_DIST_CLINIC_TRACT
 Min.   :  0.00       Min.   :  0.00        Min.   : 0.030       
 1st Qu.: 24.88       1st Qu.: 14.33        1st Qu.: 1.010       
 Median : 30.79       Median : 23.94        Median : 2.820       
 Mean   : 30.21       Mean   : 27.17        Mean   : 4.264       
 3rd Qu.: 35.78       3rd Qu.: 37.03        3rd Qu.: 6.122       
 Max.   :100.00       Max.   :100.00        Max.   :26.460       
 NA's   :27           NA's   :27                                 
          geometry   
 MULTIPOLYGON :1996  
 epsg:4269    :   0  
 +proj=long...:   0  
                     
                     
                     
                     

Let’s move to income now

income <- PA_WV_social_final %>%
  select(TRACTFIPS, STATE, COUNTYFIPS, STATEFIPS, #State identidiers
         ACS_PCT_HH_INC_10000,#percentage of household with income less than $10,000
         ACS_PCT_HH_INC_14999, #Percentage of household with income $10,000 and $14,999
         ACS_PCT_HH_INC_24999, #Percentage of household with income $15,000 and $24,999
         ACS_PCT_HH_INC_49999, #Percentage of household with income $25,000 and $49,999
         ACS_PCT_HH_INC_99999, #Percentage of houshold with income more than $100,000
         ACS_PCT_HH_INC_100000) #Percentage of household with income $50,000 and $99,999

#Average from each race from both states
income_summary <- income %>%
  group_by(STATE) %>%
  summarise(Average_10000 = mean(ACS_PCT_HH_INC_10000, na.rm = TRUE),
            Average_10000_14999= mean(ACS_PCT_HH_INC_14999, na.rm = TRUE),
            Average_15000_24999 = mean(ACS_PCT_HH_INC_24999, na.rm = TRUE),
            Average_25000_44999 = mean (ACS_PCT_HH_INC_49999, na.rm = TRUE),
            Average_50000_99999 = mean (ACS_PCT_HH_INC_99999, na.rm = TRUE),
            Average_100000 = mean (ACS_PCT_HH_INC_100000, na.rm = TRUE))

income_long <- income_summary %>%
  pivot_longer(
    cols = starts_with("Average_"),    # Columns to pivot
    names_to = "Income",             # New column for race
    values_to = "Percentage"       # New column for values
  ) %>%
  mutate(Income = gsub("Mean_", "", Income))  # Clean up race names

income_graph <- ggplot(income_long, aes(x = Income, y = Percentage, fill = STATE)) +
  geom_bar(stat = "identity", position = "dodge") +  # Grouped bars
  theme_minimal() +
  labs(
    title = "Income Comparison by State",
    x = "Income Category",
    y = "Percentage (%)",
    fill = "State"
  ) +
  scale_fill_manual(values = c("Pennsylvania" = "blue", "West Virginia" = "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels

print(income_graph)

With the graph above, it looks like Pennsylvania households have higher salary comparing to West Virginia, which may overall distribute to better access to healthcare and diabetes management. This income disparity often linked to difference in insurance coverage, as higher income are more likely to result in better access to private insurance or employer-sponsored plans, which may be the factor that reduce care barriers for PA residents compared to those in WV.

insurance <- PA_WV_social_final %>%
  select(TRACTFIPS, STATE, COUNTYFIPS, STATEFIPS, #State identidiers
         ACS_PCT_MEDICAID_ANY,
         ACS_PCT_MEDICARE_ONLY,
         ACS_PCT_PRIVATE_ANY,
         ACS_PCT_PUBLIC_ONLY,
         ACS_PCT_UNINSURED)

#Average from each race from both states
insurance_summary <- insurance %>%
  group_by(STATE) %>%
  summarise(AverageMedicaid = mean(ACS_PCT_MEDICAID_ANY, na.rm = TRUE),
            AverageMedicare= mean(ACS_PCT_MEDICARE_ONLY, na.rm = TRUE),
            AveragePrivate = mean(ACS_PCT_PRIVATE_ANY, na.rm = TRUE),
            AveragePublic = mean (ACS_PCT_PUBLIC_ONLY, na.rm = TRUE),
            AverageUninsured = mean (ACS_PCT_UNINSURED, na.rm = TRUE))
insurance_summary
# A tibble: 2 × 6
  STATE         AverageMedicaid AverageMedicare AveragePrivate AveragePublic
  <chr>                   <dbl>           <dbl>          <dbl>         <dbl>
1 Pennsylvania             16.8            5.28           64.7          23.1
2 West Virginia            22.9            6.70           54.3          31.7
# ℹ 1 more variable: AverageUninsured <dbl>
insurance_long <- insurance_summary %>%
  pivot_longer(
    cols = starts_with("Average"),    # Columns to pivot
    names_to = "Insurance",             # New column for race
    values_to = "Percentage"       # New column for values
  ) %>%
  mutate(Insurance = gsub("Mean_", "", Insurance))  # Clean up race names

insurance_graph <- ggplot(insurance_long, aes(x = Insurance, y = Percentage, fill = STATE)) +
  geom_bar(stat = "identity", position = "dodge") +  # Grouped bars
  theme_minimal() +
  labs(
    title = "Insurance Comparison by State",
    x = "Insurance Category",
    y = "Percentage (%)",
    fill = "State"
  ) +
  scale_fill_manual(values = c("Pennsylvania" = "blue", "West Virginia" = "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels

print(insurance_graph)

With the graph above, it looks like almost everyone in both West Virginia and Pennsylvania has insurance, more specifically private insurance and only a small percentage of the population are uninsured. This might correlates to the high percentage of “No” in diabetes prevalence because people with insurance are more likely to have routine checkups comparing to people who are uninsured.

Next, I would like to examine whether there are near-by clinics in both states as access to healthcare facillities may also influence individuals’ ability to receive routine checkups and diabetes management.

#Create color palette based on the white population
pal <- colorRampPalette(brewer.pal(9, "YlOrRd"))(100)

# Create the map using ggplot
ggplot(map_need) +
  geom_sf(aes(fill = POS_DIST_CLINIC_TRACT)) +  
  scale_fill_gradientn(colors = pal, name = "Distance from Clinic") +  
  theme_minimal() +
  labs(title = "Distance from Clinic by County in Pennsylvania and West Virginia")  

4 Results

The analysis shows that Pennsylvania have more income and more diverse Both West Virginia and Pennsylvania have high rates of insurance coverage with only a small percentage of the population uninsured. The higher income and insurance coverage is likely linked to more frequent routine checkups, which can improve diabetes management.

5 Conclusion

Section 4

In conclusion, this analysis highlights the significant role of social determinants, such as race, income, insurance coverage and access to healthcare shaping diabetes outcomes in both Pennsylvania and West Virginia. While higher income and insurance coverage in both states can associated with more frequent routine checkups, the racial and ethnic diversity in Pennsylvania may introduce additional complexities in healthcare access outcomes. The proximity of healthcare clinics also play a key role, with access varying across regions, highlighting the need for targeted interventions and address these factors to reduce health disparities and improve diabetes management for all populations.

Some limitations of this report include the fact that while the Social Determinants Database provides valuable information on social determinants of health, even at the county level, the BRFSS does not offer county-level data on diabetes prevalence. This limitation restricts the ability to perform a more granular analysis of diabetes rates across different regions within Pennsylvania and West Virginia. A more detailed county-level analysis would provide a clearer understanding of how diabetes prevalence may vary based on geographic location and population distribution, allowing for more targeted insights and recommendations.